Indexing Strategy & Performance Optimization
This document details the MongoDB indexing strategy and performance optimization techniques used in the notification system. It covers the indexes on each collection and the rationale behind them; the query patterns for notification delivery, user management, and data aggregation; index creation syntax; and query-plan analysis with explain(). It closes with guidance on index maintenance, including monitoring index usage statistics and identifying missing indexes, and best practices for index design that balance write-performance impact, storage overhead, and read optimization.
The notification system is organized around five main MongoDB collections and supporting services:
Notices: stores notifications and delivery status
Jobs: structured job listings
PlacementOffers: placement offer data
Users: user subscription and preference data
OfficialPlacementData: aggregated statistics snapshots
Database connectivity and collection access are encapsulated in DBClient, which connects to the SupersetPlacement database and exposes collection handles.
DatabaseService implements CRUD and aggregation operations for notices, jobs, placement offers, users, and policies, delegating to DBClient.
NotificationRunner orchestrates sending unsent notices via Telegram and/or Web Push using NotificationService.
Services for Telegram and Web Push implement channel-specific logic and rely on DatabaseService for user and subscription data.
Key operational flows:
CLI invokes NotificationRunner → NotificationService → DatabaseService → DBClient → MongoDB collections
Indexes are created at the database level to optimize these flows
The indexing strategy targets three primary workloads:
Notification delivery: finding unsent notices and marking them sent
User management: active user queries and subscription lookups
Data aggregation: statistics and analytics over placement and notices
Notices Collection Indexes#
Purpose-built indexes:
Unique index on id for fast deduplication and existence checks
Index on sent_to_telegram for efficient unsent notice retrieval
Index on created_at for chronological sorting and recent queries
Compound index on {source, category} for filtering by source and category
Rationale:
Existence checks and insert-on-absent rely on the unique id index
Delivery pipeline filters unsent notices using sent_to_telegram
Sorting by created_at supports FIFO delivery ordering
Compound index accelerates filtering by source and category
Index creation syntax:
Unique: createIndex({ id: 1 }, { unique: true })
Delivery: createIndex({ sent_to_telegram: 1 })
Chronology: createIndex({ created_at: -1 })
Filtering: createIndex({ source: 1, category: 1 })
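The four notices indexes above can also be created programmatically at startup. A minimal PyMongo-style sketch (the spec list and helper name are illustrative; only the field names come from this document):

```python
# Index definitions for the Notices collection, expressed as
# (keys, options) pairs in the form PyMongo's create_index accepts.
NOTICE_INDEXES = [
    ([("id", 1)], {"unique": True}),          # dedup / existence checks
    ([("sent_to_telegram", 1)], {}),          # unsent-notice retrieval
    ([("created_at", -1)], {}),               # chronological sorting
    ([("source", 1), ("category", 1)], {}),   # source + category filtering
]

def ensure_indexes(collection, specs=NOTICE_INDEXES):
    """Apply each (keys, options) spec; create_index is idempotent,
    so this is safe to run on every startup."""
    for keys, options in specs:
        collection.create_index(keys, **options)
```

Calling `ensure_indexes(db.Notices)` once during application bootstrap keeps index definitions versioned alongside the code rather than applied ad hoc in a shell.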
Query plan analysis:
Use explain("executionStats") on unsent notice queries to confirm index usage
Verify that projection and limits minimize returned documents
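The check above can be automated with a small helper that walks the winning plan from explain("executionStats") and flags full collection scans. The plan shape here is simplified: real plans may carry an `inputStages` array for multi-child stages, which this sketch does not handle:

```python
def uses_index(explain_output):
    """Return True if the winning plan contains no COLLSCAN stage.

    explain_output: the dict returned by explain(); only the
    queryPlanner.winningPlan / inputStage chain is inspected.
    """
    def stages(plan):
        yield plan.get("stage")
        child = plan.get("inputStage")
        if child:
            yield from stages(child)

    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" not in set(stages(winning))
```

Running this against the explain output of the unsent-notice query in a test suite catches index regressions before they reach production.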
Performance tips:
Prefer filtered queries with sent_to_telegram and created_at
Use projection to limit fields when retrieving notices
Batch updates for marking sent to reduce round trips
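Marking a whole batch as sent can be done with a single update_many call instead of one update per notice. A sketch of the filter/update pair (field names from this document):

```python
def mark_sent_batch(notice_ids):
    """Build the (filter, update) pair for one update_many round trip
    marking a batch of notices as delivered via Telegram."""
    filter_ = {"id": {"$in": list(notice_ids)}}
    update = {"$set": {"sent_to_telegram": True}}
    return filter_, update
```

Usage would be `collection.update_many(*mark_sent_batch(ids))`; the `$in` filter is served by the unique `id` index, so the batch update stays cheap.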
Section sources
Jobs Collection Indexes#
Indexes:
Unique index on job_id for upsert and existence checks
Index on company for company-based queries
Index on application_deadline for deadline-sensitive filtering
Index on qualification_criteria.branches for branch-based targeting
Rationale:
job_id uniqueness prevents duplicates during upsert
company index supports filtering and reporting
application_deadline enables queries for upcoming deadlines
qualification_criteria.branches index targets eligible candidates
Index creation syntax:
Unique: createIndex({ job_id: 1 }, { unique: true })
Company: createIndex({ company: 1 })
Deadline: createIndex({ application_deadline: 1 })
Branches: createIndex({ "qualification_criteria.branches": 1 }, )
Query patterns:
Upsert structured jobs using job_id
Filter by company and deadline for targeted notifications
Project only required fields to reduce payload
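A typical deadline-window filter combining the indexed fields might look like the following sketch (branch value and window length are illustrative):

```python
from datetime import datetime, timedelta

def upcoming_jobs_query(branch, days=7, now=None):
    """Filter for jobs whose application_deadline falls within the next
    `days`, restricted to one eligible branch. Both fields are indexed
    (application_deadline and qualification_criteria.branches)."""
    now = now or datetime.utcnow()
    return {
        "application_deadline": {"$gte": now,
                                 "$lte": now + timedelta(days=days)},
        "qualification_criteria.branches": branch,
    }
```

Pairing this filter with a projection of only the fields the notification template needs keeps the returned payload small.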
Section sources
PlacementOffers Collection Indexes#
Indexes:
Unique offer_id for deduplication and upsert
company for company-centric analytics and notifications
processing_status for workflow state queries
created_at for reverse-chronological listing
Rationale:
offer_id ensures idempotent processing of offers
company index supports company-specific alerts
processing_status optimizes workflow queries
created_at sorts newest offers first
Index creation syntax:
Unique: createIndex({ offer_id: 1 }, { unique: true })
Company: createIndex({ company: 1 })
Status: createIndex({ processing_status: 1 })
Timestamp: createIndex({ created_at: -1 })
Query patterns:
Sort by created_at descending for recent offers
Group and count students per company for analytics
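The per-company count can be expressed as a short aggregation pipeline; a sketch (the output field name `offers` is illustrative):

```python
def offers_per_company_pipeline():
    """Group placement offers by company and count them,
    sorted by offer count descending."""
    return [
        {"$group": {"_id": "$company", "offers": {"$sum": 1}}},
        {"$sort": {"offers": -1}},
    ]
```

Run via `collection.aggregate(offers_per_company_pipeline())`; the `$group` stage scans all matching documents, so prefix a `$match` stage when only a subset of offers is relevant.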
Section sources
Users Collection Indexes#
Indexes:
Unique user_id for user identity and soft-deletion workflows
subscription_active for active user queries
last_active for recency-based operations
registered_at for registration-time analytics
Rationale:
user_id uniqueness underpins user management
subscription_active powers broadcast targeting
last_active supports engagement metrics and cleanup
registered_at enables cohort analysis
Index creation syntax:
Unique: createIndex({ user_id: 1 }, { unique: true })
Active: createIndex({ subscription_active: 1 })
LastActive: createIndex({ last_active: -1 })
Registered: createIndex({ registered_at: 1 })
Query patterns:
Broadcast to active users
Count active users efficiently
Retrieve user profiles by user_id
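The broadcast and count patterns share the same filter, served by the subscription_active index. A sketch (projection fields are assumptions beyond what this document specifies):

```python
# Filter used for both broadcast targeting and active-user counts.
ACTIVE_FILTER = {"subscription_active": True}

# Project only what the broadcaster needs.
BROADCAST_PROJECTION = {"_id": 0, "user_id": 1}

def broadcast_targets(users_collection, batch_size=100):
    """Stream active users' ids for broadcasting;
    users_collection.count_documents(ACTIVE_FILTER) gives the
    active-user count via the same index."""
    return users_collection.find(ACTIVE_FILTER, BROADCAST_PROJECTION,
                                 batch_size=batch_size)
```

Iterating the cursor in batches avoids loading every active user into memory at once during a broadcast.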
Section sources
OfficialPlacementData Collection Indexes#
Indexes:
Unique data_id for snapshot identity
timestamp descending for latest snapshot queries
Rationale:
data_id ensures idempotent snapshot writes
timestamp descending supports latest-first retrieval
Index creation syntax:
Unique: createIndex({ data_id: 1 }, { unique: true })
Latest: createIndex({ timestamp: -1 })
Query patterns:
Find latest snapshot by sorting by timestamp desc
Project only required fields for branch-wise statistics
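The latest-snapshot lookup reduces to a find_one with a descending timestamp sort, matching the `{ timestamp: -1 }` index. A sketch (the `branch_stats` field name is illustrative):

```python
def latest_snapshot_args(fields=("branch_stats",)):
    """Arguments for fetching the newest snapshot: empty filter,
    projection limited to the requested fields, newest-first sort."""
    projection = {"_id": 0, **{f: 1 for f in fields}}
    sort = [("timestamp", -1)]
    return {}, projection, sort
```

Usage would be `filter_, proj, sort = latest_snapshot_args(); doc = collection.find_one(filter_, proj, sort=sort)`.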
Section sources
Notification Delivery Pipeline#
The delivery pipeline retrieves unsent notices, formats messages, and sends via Telegram and/or Web Push, then marks them sent.
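The formatting step of that pipeline can be isolated as a pure function, which keeps it testable independently of the channel services. A sketch, assuming notices carry `title` and `link` fields (not specified in this document):

```python
def prepare_batch(notices):
    """Turn unsent notice documents into (notice_id, message) pairs;
    delivery itself remains the job of the Telegram / Web Push services."""
    batch = []
    for n in notices:
        text = f"{n.get('title', 'Notice')}\n{n.get('link', '')}".strip()
        batch.append((n["id"], text))
    return batch
```

After the channel services confirm delivery, the returned ids feed a single batched update marking the notices sent.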
The notification system’s data access layer is decoupled via DBClient and DatabaseService, enabling testability and clean separation of concerns.
Index creation syntax and usage:
Notices: createIndex({ id: 1 }, { unique: true }), createIndex({ sent_to_telegram: 1 }), createIndex({ created_at: -1 }), createIndex({ source: 1, category: 1 })
Jobs: createIndex({ job_id: 1 }, { unique: true }), createIndex({ company: 1 }), createIndex({ application_deadline: 1 }), createIndex({ "qualification_criteria.branches": 1 })
PlacementOffers: createIndex({ offer_id: 1 }, { unique: true }), createIndex({ company: 1 }), createIndex({ processing_status: 1 }), createIndex({ created_at: -1 })
Users: createIndex({ user_id: 1 }, { unique: true }), createIndex({ subscription_active: 1 }), createIndex({ last_active: -1 }), createIndex({ registered_at: 1 })
OfficialPlacementData: createIndex({ data_id: 1 }, { unique: true }), createIndex({ timestamp: -1 })
Query plan analysis:
Use explain("executionStats") on unsent notice retrieval to verify index usage
Aggregate indexStats to review index effectiveness
Efficient queries and projections:
Limit unsent notice retrieval and apply projection to reduce payload
Use targeted filters on source/category and company/processing_status
Batch operations:
Insert and update notices and offers in batches to improve throughput
TTL indexes:
Consider expireAfterSeconds on temporary logs or transient collections to auto-clean
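Expressed in the same (keys, options) form as the other index specs, a TTL index on a transient log collection might look like this sketch (the collection, field choice, and 7-day window are assumptions):

```python
# TTL index spec: documents expire `expireAfterSeconds` seconds after
# the value stored in their created_at field.
TTL_LOG_INDEX = ([("created_at", 1)], {"expireAfterSeconds": 7 * 24 * 3600})
```

Note that TTL expiry only works when the indexed field holds BSON date values; documents with missing or non-date values are never expired.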
Connection pooling:
PyMongo pools connections by default; configure maxPoolSize appropriately
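The pool cap can be set directly in the connection string; PyMongo's default is maxPoolSize=100. A sketch (host, port, and the chosen cap of 50 are illustrative):

```python
def pooled_uri(host="localhost", port=27017, max_pool_size=50):
    """Connection string with an explicit connection-pool cap,
    to be passed to pymongo.MongoClient(uri)."""
    return f"mongodb://{host}:{port}/?maxPoolSize={max_pool_size}"
```

Size the cap to the expected concurrency of the notification runner; an oversized pool wastes server connections, an undersized one serializes requests.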
Caching:
Cache expensive computations like placement statistics for a short TTL
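A short-TTL cache for such aggregates needs only a few lines of stdlib Python. A minimal sketch (not thread-safe; the 300-second default is an assumption):

```python
import time

class TTLCache:
    """Time-based cache for expensive results such as placement
    statistics; entries silently expire after ttl_seconds."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]
        return None  # missing or expired

    def set(self, key, value):
        self._store[key] = (value, time.monotonic())
```

On a cache miss, run the aggregation, store the result, and serve subsequent requests from memory until the TTL lapses.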
Common issues and resolutions:
Slow unsent notice retrieval: verify sent_to_telegram and created_at indexes; add projection and limit
High write amplification: batch inserts/updates; avoid unnecessary index updates
Index growth: monitor index sizes; drop unused indexes; consolidate overlapping indexes
Query plan regressions: periodically run explain("executionStats"); adjust indexes based on actual query patterns
Monitoring index usage:
Use $indexStats aggregation to inspect index usage frequency and size
Track slow query logs and re-index based on hotspots
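A $indexStats pipeline that surfaces rarely used indexes might look like this sketch (the `min_ops` threshold is an assumption; review candidates manually before dropping anything):

```python
def unused_index_pipeline(min_ops=1):
    """Pipeline over $indexStats flagging indexes whose recorded access
    count (accesses.ops) is below min_ops since the last server restart."""
    return [
        {"$indexStats": {}},
        {"$match": {"accesses.ops": {"$lt": min_ops}}},
        {"$project": {"name": 1, "accesses.ops": 1}},
    ]
```

Note that $indexStats counters reset on server restart, so judge "unused" over a long enough uptime window.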
Identifying missing indexes:
Review explain("executionStats") outputs for collection scans (COLLSCAN) on frequently queried fields
Correlate with query patterns from notification delivery and user management
The indexing strategy aligns closely with the notification system’s core workloads: delivering unsent notices, managing users, and aggregating placement data. Unique indexes on identifiers prevent duplication, while targeted single-field and compound indexes accelerate common queries. Proper use of explain(), projections, and batch operations ensures optimal performance. Regular monitoring and iterative index refinement will maintain efficiency as data volumes grow.